AITopics | optimal temperature

optimal temperature

191595dc11b4d6e54f01504e3aa92f96-Supplemental.pdf

Neural Information Processing Systems, Feb-7-2026, 15:36:09 GMT

ensemble, power law, wideresnet, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Optimal Attention Temperature Enhances In-Context Learning under Distribution Shift

Demir, Samet, Dogan, Zafer

arXiv.org Machine Learning, Nov-4-2025

Pretrained Transformers excel at in-context learning (ICL), inferring new tasks from only a handful of examples. Yet, their ICL performance can degrade sharply under distribution shift between pretraining and test data, a regime increasingly common in real-world deployments. While recent empirical work hints that adjusting the attention temperature in the softmax can enhance Transformer performance, the attention temperature's role in ICL under distribution shift remains unexplored. This paper provides the first theoretical and empirical study of attention temperature for ICL under distribution shift. Using a simplified but expressive "linearized softmax" framework, we derive closed-form generalization error expressions and prove that shifts in input covariance or label noise substantially impair ICL, but that an optimal attention temperature exists which minimizes this error. We then validate our predictions through extensive simulations on linear regression tasks and large-scale experiments with GPT-2 and LLaMA2-7B on question-answering benchmarks. Our results establish attention temperature as a principled and powerful mechanism for improving the robustness of ICL in pretrained Transformers, advancing theoretical understanding and providing actionable guidance for selecting attention temperature in practice.
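To make the term "attention temperature" concrete, here is a minimal sketch of single-query dot-product attention with a temperature applied inside the softmax. It uses ordinary softmax attention, not the paper's "linearized softmax" framework, and the function name `attention`, the placement of the temperature (dividing the scaled scores), and the toy vectors are all illustrative assumptions.

```python
import math

def attention(query, keys, values, temperature=1.0):
    """Single-query softmax attention with an explicit temperature (illustrative)."""
    d = len(query)
    # Scaled dot-product scores, additionally divided by the temperature.
    scores = [
        sum(q * k for q, k in zip(query, key)) / (temperature * math.sqrt(d))
        for key in keys
    ]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output is the attention-weighted average of the value vectors.
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return weights, out

# Hypothetical toy inputs: the first key matches the query best.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0], [2.0], [3.0]]
for t in (0.25, 1.0, 4.0):
    weights, out = attention(query, keys, values, temperature=t)
    print(t, [round(w, 3) for w in weights], [round(o, 3) for o in out])
```

A lower temperature concentrates the weights on the best-matching key, while a higher one spreads them more evenly across the context.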

large language model, machine learning, natural language, (20 more...)

2511.01292

Country: Europe > Latvia > Lubāna Municipality > Lubāna (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Exploring the Impact of Temperature on Large Language Models: Hot or Cold?

Li, Lujun, Sleem, Lama, Gentile, Niccolo', Nichil, Geoffrey, State, Radu

arXiv.org Artificial Intelligence, Jun-10-2025

The sampling temperature, a critical hyperparameter in large language models (LLMs), modifies the logits before the softmax layer, thereby reshaping the distribution of output tokens. Recent studies have challenged the Stochastic Parrots analogy by demonstrating that LLMs are capable of understanding semantics rather than merely memorizing data and that randomness, modulated by sampling temperature, plays a crucial role in model inference. In this study, we systematically evaluated the impact of temperature in the range of 0 to 2 on datasets designed to assess six different capabilities, conducting statistical analyses on open source models of three different sizes: small (1B–4B), medium (6B–13B), and large (40B–80B). Our findings reveal distinct skill-specific effects of temperature on model performance, highlighting the complexity of optimal temperature selection in practical applications. To address this challenge, we propose a BERT-based temperature selector that takes advantage of these observed effects to identify the optimal temperature for a given prompt. We demonstrate that this approach can significantly improve the performance of small and medium models on the SuperGLUE datasets. Furthermore, our study extends to FP16 precision inference, revealing that temperature effects are consistent with those observed in 4-bit quantized models. By evaluating temperature effects up to 4.0 in three quantized models, we find that the Mutation Temperature (the point at which significant performance changes occur) increases with model size.
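As a rough illustration of how a sampling temperature reshapes the output distribution, the sketch below divides a set of logits by the temperature before applying the softmax and reports the entropy of the result. The function names and the example logits are hypothetical and are not taken from the paper.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        # Temperature 0 is usually handled as greedy argmax decoding instead.
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.1]  # hypothetical next-token logits
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

Temperatures below 1 sharpen the distribution toward the highest-logit token, and temperatures above 1 flatten it, which raises the entropy of sampled outputs.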

computational linguistic, large language model, machine learning, (17 more...)

2506.07295

Country:

Europe (1.00)
North America > Mexico (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Optimizing Temperature for Language Models with Multi-Sample Inference

Du, Weihua, Yang, Yiming, Welleck, Sean

arXiv.org Artificial Intelligence, Feb-7-2025

Multi-sample aggregation strategies, such as majority voting and best-of-N sampling, are widely used in contemporary large language models (LLMs) to enhance predictive accuracy across various tasks. A key challenge in this process is temperature selection, which significantly impacts model performance. Existing approaches either rely on a fixed default temperature or require labeled validation data for tuning, which are often scarce and difficult to obtain. This paper addresses the challenge of automatically identifying the (near)-optimal temperature for different LLMs using multi-sample aggregation strategies, without relying on task-specific validation data. We provide a comprehensive analysis of temperature's role in performance optimization, considering variations in model architectures, datasets, task types, model sizes, and predictive accuracy. Furthermore, we propose a novel entropy-based metric for automated temperature optimization, which consistently outperforms fixed-temperature baselines. Additionally, we incorporate a stochastic process model to enhance interpretability, offering deeper insights into the relationship between temperature and model performance.
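For readers unfamiliar with multi-sample aggregation, the following sketch shows plain majority voting over answers sampled at different temperatures. It does not implement the paper's entropy-based metric; the candidate answers, their scores, and the helper names are hypothetical stand-ins for real model outputs.

```python
import math
import random
from collections import Counter

def sample_answers(answers, logits, temperature, n, rng):
    """Draw n answers from softmax(logits / temperature)."""
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp((z - m) / temperature) for z in logits]
    return rng.choices(answers, weights=weights, k=n)

def majority_vote(samples):
    """Return the most frequent sample (ties go to the one seen first)."""
    return Counter(samples).most_common(1)[0][0]

rng = random.Random(0)        # fixed seed so runs are repeatable
answers = ["42", "41", "24"]  # hypothetical candidate answers
logits = [1.5, 1.0, 0.2]      # hypothetical model scores for each answer
for t in (0.3, 1.0, 2.0):
    samples = sample_answers(answers, logits, t, n=25, rng=rng)
    print(t, majority_vote(samples), dict(Counter(samples)))
```

The trade-off is that a low temperature yields near-identical samples, so voting adds little, while a high temperature yields diverse samples that may each be less reliable.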

large language model, machine learning, natural language, (16 more...)

2502.05234

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.55)

Temperature Optimization for Bayesian Deep Learning

Ng, Kenyon, van der Heide, Chris, Hodgkinson, Liam, Wei, Susan

arXiv.org Machine Learning, Oct-8-2024

The Cold Posterior Effect (CPE) is a phenomenon in Bayesian Deep Learning (BDL), where tempering the posterior to a cold temperature often improves the predictive performance of the posterior predictive distribution (PPD). Although the term 'CPE' suggests colder temperatures are inherently better, the BDL community increasingly recognizes that this is not always the case. Despite this, there remains no systematic method for finding the optimal temperature beyond grid search. In this work, we propose a data-driven approach to select the temperature that maximizes test log-predictive density, treating the temperature as a model parameter and estimating it directly from the data. We empirically demonstrate that our method performs comparably to grid search, at a fraction of the cost, across both regression and classification tasks. Finally, we highlight the differing perspectives on CPE between the BDL and Generalized Bayes communities: while the former primarily focuses on predictive performance of the PPD, the latter emphasizes calibrated uncertainty and robustness to model misspecification; these distinct objectives lead to different temperature preferences.

data augmentation, posterior, test lpd, (15 more...)

2410.05757

Country:

Oceania > Australia (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre:

Research Report (0.81)
Overview (0.67)

Industry: Government (0.46)

Fine-tune your Classifier: Finding Correlations With Temperature

Chamand, Benjamin, Risser-Maroix, Olivier, Kurtz, Camille, Joly, Philippe, Loménie, Nicolas

arXiv.org Artificial Intelligence, Oct-18-2022

Temperature is a widely used hyperparameter in various tasks involving neural networks, such as classification or metric learning, whose choice can have a direct impact on the model performance. Most of existing works select its value using hyperparameter optimization methods requiring several runs to find the optimal value. We propose to analyze the impact of temperature on classification tasks by describing a dataset as a set of statistics computed on representations on which we can build a heuristic giving us a default value of temperature. We study the correlation between these extracted statistics and the observed optimal temperatures.

Nevertheless, such strategies for determining a good temperature may be suboptimal or computationally too cumbersome. Surprisingly, there are very few studies proposing strategies for determining an optimal temperature. In this paper, we focus on the particular problem that, given a classification task, we need to find a correlation between an optimal value for the temperature and statistics describing the dataset such as complexity, dimension, number of classes, etc.

artificial intelligence, dataset, machine learning, (15 more...)

2210.09715

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.05)
Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)

Microsoft uses machine learning to develop smart energy solutions

#artificialintelligence, Apr-20-2020, 19:48:05 GMT

Microsoft Real Estate and Security (RE&S) is responsible for heating and cooling 115 buildings in the Puget Sound area. Microsoft Core Services and Engineering (CSEO) partnered with RE&S to improve the schedules for its heating, ventilation, and air conditioning (HVAC) system in order to reduce costs and increase employee comfort. CSEO implemented machine learning to predict when employees will arrive at Microsoft buildings each morning and how long it will take for a building to reach its optimal comfort temperature. As a result, we were able to generate a dynamic HVAC schedule that delivered significant cost savings and increased employee comfort for RE&S. We're continuing to implement machine learning in our buildings throughout the Puget Sound region and we're encouraging the rest of Microsoft to use machine learning to optimize operations and drive digital transformation.

engineering team, hvac system, optimal temperature, (12 more...)

Country:

Pacific Ocean > North Pacific Ocean > Puget Sound (0.49)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)

Industry:

Construction & Engineering > HVAC (1.00)
Energy (0.85)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Analysis of Softmax Approximation for Deep Classifiers under Input-Dependent Label Noise

Collier, Mark, Mustafa, Basil, Kokiopoulou, Efi, Berent, Jesse

arXiv.org Machine Learning, Mar-15-2020

Modelling uncertainty arising from input-dependent label noise is an increasingly important problem. A state-of-the-art approach for classification [Kendall and Gal, 2017] places a normal distribution over the softmax logits, where the mean and variance of this distribution are learned functions of the inputs. This approach achieves impressive empirical performance but lacks theoretical justification. We show that this model is a special case of a well known and theoretically understood model studied in econometrics. Under this view the softmax over the logit distribution is a smooth approximation to an argmax, where the approximation is exact in the zero temperature limit. We further illustrate that the softmax temperature controls a bias-variance trade-off and the optimal point on this trade-off is not always found at 1.0. By tuning the softmax temperature, we achieve improved performance on well known image classification benchmarks with controlled label noise. For image segmentation, where input-dependent label noise naturally arises, we show that tuning the temperature increases the mean IoU on the PASCAL VOC and Cityscapes datasets by more than 1% over the state-of-the-art model and a strong baseline that does not model this noise source.
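The statement that the softmax is a smooth approximation to an argmax, exact in the zero temperature limit, can be written out for the standard temperature-scaled softmax. The notation below ($z$ for a logit vector of length $K$, $\tau$ for the temperature) is assumed for illustration and is not taken from the paper, and the limit is stated for the case of a unique largest logit.

```latex
\[
\operatorname{softmax}_\tau(z)_i = \frac{\exp(z_i/\tau)}{\sum_{j=1}^{K} \exp(z_j/\tau)},
\qquad
\lim_{\tau \to 0^{+}} \operatorname{softmax}_\tau(z)_i =
\begin{cases}
1 & \text{if } i = \arg\max_j z_j,\\
0 & \text{otherwise.}
\end{cases}
\]
```

Setting $\tau = 1$ recovers the ordinary softmax, the value the abstract describes as not always optimal.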

dataset, heteroscedastic model, tensor, (16 more...)

2003.06778

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Promising Solution (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Calibrating Deep Neural Networks using Focal Loss

Mukhoti, Jishnu, Kulharia, Viveka, Sanyal, Amartya, Golodetz, Stuart, Torr, Philip H. S., Dokania, Puneet K.

arXiv.org Machine Learning, Feb-21-2020

Miscalibration (a mismatch between a model's confidence and its correctness) of Deep Neural Networks (DNNs) makes their predictions hard to rely on. Ideally, we want networks to be accurate, calibrated and confident. We show that, as opposed to the standard cross-entropy loss, focal loss (Lin et al., 2017) allows us to learn models that are already very well calibrated. When combined with temperature scaling, whilst preserving accuracy, it yields state-of-the-art calibrated models. We provide a thorough analysis of the factors causing miscalibration, and use the insights we glean from this to justify the empirically excellent performance of focal loss. To facilitate the use of focal loss in practice, we also provide a principled approach to automatically select the hyperparameter involved in the loss function. We perform extensive experiments on a variety of computer vision and NLP datasets, and with a wide variety of network architectures, and show that our approach achieves state-of-the-art accuracy and calibration in almost all cases.
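For context on the comparison between focal loss and cross-entropy, here is a minimal per-example sketch of the basic focal loss, -(1 - p)^gamma * log(p), where p is the predicted probability of the true class. It omits the optional class-weighting term and does not reproduce the paper's procedure for selecting gamma; the function names and example probabilities are illustrative assumptions.

```python
import math

def cross_entropy(p_true):
    """Cross-entropy loss for one example, given the true-class probability."""
    return -math.log(p_true)

def focal_loss(p_true, gamma=2.0):
    """Basic focal loss for one example (no class weighting)."""
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must be in (0, 1]")
    # The (1 - p)^gamma factor down-weights examples the model already gets right.
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

for p in (0.6, 0.9, 0.99):  # hypothetical true-class probabilities
    print(p, round(cross_entropy(p), 4), round(focal_loss(p, gamma=2.0), 4))
```

With gamma set to 0 the two losses coincide. Larger values of gamma shrink the loss on confident, correct predictions, reducing the incentive to push those probabilities even closer to 1.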

calibration, focal loss, wide resnet, (14 more...)

2002.09437

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
